Some Fitting of Naive Bayesian Spam Filtering for Japanese Environment

Authors

  • Manabu Iwanaga
  • Toshihiro Tabata
  • Kouichi Sakurai
Abstract

Bayesian filtering is one of the best-known anti-spam measures. However, there is no standard implementation for handling Japanese emails with Bayesian filtering. In this paper, we compare several possible ways of handling Japanese emails, focusing on tokenization and corpus separation. We also report experimental results and the lessons learned from those experiments.
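
Japanese text is not delimited by spaces, so a Bayesian filter first needs some rule for cutting a message into tokens. The sketch below illustrates one common option, overlapping character bigrams, feeding a small naive Bayes classifier with add-one smoothing. It is only an illustration under stated assumptions: the abstract does not say which tokenizers the authors compared, and the names `char_bigrams` and `NaiveBayesFilter` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character bigrams, ignoring whitespace.

    Assumption: bigrams are just one plausible tokenizer for Japanese;
    a morphological analyzer could be swapped in instead.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


class NaiveBayesFilter:
    """Minimal two-class naive Bayes filter (hypothetical helper class)."""

    def __init__(self, tokenizer=char_bigrams):
        self.tokenizer = tokenizer
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Accumulate token counts and message counts per class.
        self.counts[label].update(self.tokenizer(text))
        self.docs[label] += 1

    def log_score(self, text, label):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.counts[label].values())
        # Class prior with add-one smoothing over the two classes.
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for tok in self.tokenizer(text):
            # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
            score += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
        return score

    def classify(self, text):
        spam = self.log_score(text, "spam")
        ham = self.log_score(text, "ham")
        return "spam" if spam > ham else "ham"
```

Because the tokenizer is passed in as a parameter, the same classifier can be retrained with a different tokenizing rule to compare the two on the same messages.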

Similar resources

Scalable Centralized Bayesian Spam Mitigation with Bogofilter

Bayesian content filters gained popular acclaim when they were put forward in 2002 by Paul Graham as a potential long-term solution for the spam problem. They have since fallen from the limelight, however, due to perceived attack vulnerabilities inherent to all content-based filters as well as real and imagined vulnerabilities specific to Bayesian filters. It has also been assumed that Bayesian...

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

We investigate the performance of two machine learning algorithms in the context of antispam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an ...

An evaluation of Naive Bayesian anti-spam filtering

It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performan...

The study on the spam filtering technology based on Bayesian algorithm

This paper analyzed spam filtering technology, carried out a detailed study of Naive Bayes algorithm, and proposed the improved Naive Bayesian mail filtering technology. Improvement can be seen in text selection as well as feature extraction. The general Bayesian text classification algorithm mostly takes information gain and cross-entropy algorithm in feature selection. Through the principle o...

PRIS Kidult Anti-SPAM Solution at the TREC 2005 Spam Track: Improving the Performance of Naive Bayes for Spam Detection

Spam has recently become a serious problem for both e-mail users and Internet Service Providers (ISPs). Solutions to spam abuse would be both technical and legal or regulatory. This paper reports our solution for the TREC 2005 spam track, in which we consider the use of a Naive Bayes spam filter for its desirable properties (simplicity, low time and memory requirements, etc.). Th...

Publication date: 2004